refactor: bgz17_bridge.rs fully rewired to crate::simd::I32x16 #78

Merged
AdaWorldAPI merged 2 commits into master from claude/setup-embedding-pipeline-Fa65C
Apr 3, 2026

Conversation

@AdaWorldAPI
Owner

No description provided.

claude added 2 commits April 3, 2026 18:09
target-cpu=x86-64-v4: native AVX-512, all SIMD inlined, no LazyLock overhead.
~24% faster than portable build. ONLY for AVX-512 hardware.
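
Why the dedicated build can drop the dispatch layer: `x86-64-v4` implies AVX-512F, so the feature is known at compile time. The sketch below is only an illustration, assuming the build is invoked with something like `RUSTFLAGS="-C target-cpu=x86-64-v4"`; the function name `simd_strategy` and the returned labels are hypothetical and not taken from the repository.

```rust
/// Reports which SIMD selection strategy this build would use.
/// Hypothetical helper for illustration; not part of the crate.
pub const fn simd_strategy() -> &'static str {
    // `cfg!` is evaluated at compile time. With `-C target-cpu=x86-64-v4`
    // the `avx512f` target feature is statically enabled, so the branch
    // below is resolved by the compiler and no runtime check is emitted.
    if cfg!(target_feature = "avx512f") {
        "static-avx512"
    } else {
        // Portable builds cannot assume AVX-512 and must detect it at runtime.
        "runtime-dispatch"
    }
}
```

A binary compiled this way executes AVX-512 instructions unconditionally, which is why the commit restricts it to AVX-512 hardware.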

Select Dockerfile.avx512 in Railway dashboard for server deployment.
Default Dockerfile stays portable (AVX2 CI, LazyLock dispatch).

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp

Zero raw _mm512_/_mm256_/_mm_ intrinsics remaining.
All 5 kernels rewired (92 intrinsics → 0):

  L1 distance:      from_i16_slice → sub → abs → reduce_sum
  L1 weighted:      same + from_array(WEIGHT_VEC) → mul
  Sign agreement:   from_i16_slice → xor → cmpge_zero_mask
  XOR bind:         from_i16_slice → xor → to_i16_array
  Inject noise:     from_i16_slice → add → simd_min/max → to_i16_array
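
To make the pipelines above concrete, here is a minimal sketch of the L1 distance and sign agreement kernels. It assumes an array-backed stand-in for `crate::simd::I32x16`; the method names follow the commit message, but their signatures, the remainder handling, and the helper `sign_agreement_16` are assumptions rather than the crate's actual API.

```rust
/// Hypothetical array-backed stand-in for `crate::simd::I32x16`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I32x16([i32; 16]);

impl I32x16 {
    /// Widens the first 16 `i16` values to `i32` lanes. Panics if `s.len() < 16`.
    pub fn from_i16_slice(s: &[i16]) -> Self {
        let mut out = [0i32; 16];
        for (o, &v) in out.iter_mut().zip(&s[..16]) {
            *o = i32::from(v);
        }
        Self(out)
    }
    /// Lane-wise subtraction.
    pub fn sub(self, rhs: Self) -> Self {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0) {
            *o -= r;
        }
        Self(out)
    }
    /// Lane-wise XOR.
    pub fn xor(self, rhs: Self) -> Self {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0) {
            *o ^= r;
        }
        Self(out)
    }
    /// Lane-wise absolute value.
    pub fn abs(self) -> Self {
        Self(self.0.map(i32::abs))
    }
    /// Horizontal sum of all 16 lanes.
    pub fn reduce_sum(self) -> i32 {
        self.0.iter().sum()
    }
    /// Bit `i` of the result is set when lane `i` is >= 0.
    pub fn cmpge_zero_mask(self) -> u16 {
        let mut mask = 0u16;
        for (i, &v) in self.0.iter().enumerate() {
            if v >= 0 {
                mask |= 1 << i;
            }
        }
        mask
    }
}

/// L1 distance: from_i16_slice -> sub -> abs -> reduce_sum per 16-lane chunk,
/// with a scalar loop for the tail that does not fill a whole chunk.
pub fn l1_distance(a: &[i16], b: &[i16]) -> i64 {
    assert_eq!(a.len(), b.len());
    let split = a.len() - a.len() % 16;
    let (a_body, a_tail) = a.split_at(split);
    let (b_body, b_tail) = b.split_at(split);
    let mut total = 0i64;
    for (x, y) in a_body.chunks_exact(16).zip(b_body.chunks_exact(16)) {
        let d = I32x16::from_i16_slice(x).sub(I32x16::from_i16_slice(y));
        total += i64::from(d.abs().reduce_sum());
    }
    for (&x, &y) in a_tail.iter().zip(b_tail) {
        total += i64::from((i32::from(x) - i32::from(y)).abs());
    }
    total
}

/// Sign agreement for one 16-lane chunk: the XOR of two lanes is
/// non-negative exactly when their sign bits match (zero counts as positive).
pub fn sign_agreement_16(a: &[i16], b: &[i16]) -> u32 {
    I32x16::from_i16_slice(a)
        .xor(I32x16::from_i16_slice(b))
        .cmpge_zero_mask()
        .count_ones()
}
```

Because every operation here is a plain loop over `[i32; 16]`, the kernel code stays identical regardless of which backend sits behind the type, which is the property the rewiring relies on.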

AVX2 2-pass patterns collapsed: polyfill I32x16 absorbs the split
internally (array-backed [i32; 16] on AVX2, native __m512i on AVX-512).

LazyLock runtime dispatch preserved. #[target_feature] preserved.
Scalar fallbacks untouched. 19/19 bgz17_bridge tests pass.
1514/1515 full suite pass (1 pre-existing timing flake in vml.rs).
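
For readers unfamiliar with the dispatch pattern mentioned above, the following is a rough sketch of `LazyLock` runtime dispatch combined with `#[target_feature]` and a scalar fallback. All names (`L1Fn`, `select_l1`, `l1_avx2`, `L1_KERNEL`) are hypothetical, and the AVX2 variant simply reuses the scalar loop compiled with AVX2 enabled; the actual kernels in `bgz17_bridge.rs` are not reproduced here.

```rust
use std::sync::LazyLock;

/// Function-pointer type shared by all variants of the kernel.
type L1Fn = fn(&[i16], &[i16]) -> i64;

/// Scalar fallback, valid on every target.
fn l1_scalar(a: &[i16], b: &[i16]) -> i64 {
    let mut total = 0i64;
    for (&x, &y) in a.iter().zip(b) {
        total += (i64::from(x) - i64::from(y)).abs();
    }
    total
}

/// Same loop, but compiled with AVX2 enabled so the compiler is free to
/// vectorize it with 256-bit instructions.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn l1_avx2_impl(a: &[i16], b: &[i16]) -> i64 {
    let mut total = 0i64;
    for (&x, &y) in a.iter().zip(b) {
        total += (i64::from(x) - i64::from(y)).abs();
    }
    total
}

/// Safe wrapper so the variant fits the `L1Fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn l1_avx2(a: &[i16], b: &[i16]) -> i64 {
    // SAFETY: `select_l1` only returns this wrapper after runtime
    // detection has confirmed that the CPU supports AVX2.
    unsafe { l1_avx2_impl(a, b) }
}

/// Picks the best available variant for the current CPU.
fn select_l1() -> L1Fn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return l1_avx2;
        }
    }
    l1_scalar
}

/// Resolved once on first use; later calls only pay for an indirect call.
static L1_KERNEL: LazyLock<L1Fn> = LazyLock::new(select_l1);

pub fn l1_distance(a: &[i16], b: &[i16]) -> i64 {
    (*L1_KERNEL)(a, b)
}
```

The indirect call through the function pointer is the kind of overhead a statically targeted build can avoid, since the compiler can then inline the selected variant directly.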

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
@AdaWorldAPI AdaWorldAPI merged commit e3f6bce into master Apr 3, 2026
5 of 14 checks passed